Genomic capacities for Reactive Oxygen Species metabolism across marine phytoplankton

Marine phytoplankton produce and scavenge Reactive Oxygen Species, to support cellular processes, while limiting damaging reactions. Some prokaryotic picophytoplankton have, however, lost all genes encoding scavenging of hydrogen peroxide. Such losses of metabolic function can only apply to Reactive Oxygen Species which potentially traverse the cell membrane outwards, before provoking damaging intracellular reactions. We hypothesized that cell radius influences which elements of Reactive Oxygen Species metabolism are partially or fully dispensable from a cell. We therefore investigated genomes and transcriptomes from diverse marine eukaryotic phytoplankton, ranging from 0.4 to 44 μm radius, to analyze the genomic allocations encoding enzymes metabolizing Reactive Oxygen Species. Superoxide has high reactivity, short lifetimes and limited membrane permeability. Genes encoding superoxide scavenging are ubiquitous across phytoplankton, but the fractional gene allocation decreased with increasing cell radius, consistent with a nearly fixed set of core genes for scavenging superoxide pools. Hydrogen peroxide has lower reactivity, longer intracellular and extracellular lifetimes and readily crosses cell membranes. Genomic allocations to both hydrogen peroxide production and scavenging decrease with increasing cell radius. Nitric Oxide has low reactivity, long intracellular and extracellular lifetimes and readily crosses cell membranes. Neither Nitric Oxide production nor scavenging genomic allocations changed with increasing cell radius. Many taxa, however, lack the genomic capacity for nitric oxide production or scavenging. The probability of presence of capacity to produce nitric oxide decreases with increasing cell size, and is influenced by flagella and colony formation. In contrast, the probability of presence of capacity to scavenge nitric oxide increases with increasing cell size, and is again influenced by flagella and colony formation.


The black queen hypothesis
The Black Queen Hypothesis states that loss of function mutations may proceed so long as some interacting community members retain the function, and the function can occur outside a given cell [83]. The Black Queen Hypothesis was formulated on the basis of Prochlorococcus, which lost the genes encoding enzymes which scavenge H 2 O 2 . Instead, Prochlorococcus allows H 2 O 2 outwards across the cell membrane to be dealt with by community members retaining the capacity to scavenge H 2 O 2 , thus saving Prochlorococcus the cost of maintaining the genes and metabolism for scavenging H 2 O 2 [83,84]. Growth and survival of Prochlorococcus indeed improves when co-cultured with 'helper' bacteria which carry genes for catalase [84][85][86][87].

Hypotheses and significance
Given that ROS show differential abilities to cross cell membranes, and have widely different diffusion distances before destruction [88], we sought to study whether cell radius, colony formation, flagella, or diatom cell shape influence genomic allocations to ROS production and scavenging across diverse marine phytoplankters.
Hypothesis 1 Cell radius across phytoplankton taxa does not influence the fraction of total gene content encoding O 2 •− production, nor scavenging. O 2 •− is highly toxic and does not readily cross biological membranes [17], so diffusional losses of O 2 •− from cells are limited, and cells need to retain capacity for detoxification of O 2 •− across cell sizes.
Hypothesis 2 Large phytoplankton allocate a smaller fraction of their total gene content to H 2 O 2 and • NO production and a larger fraction of their total gene content to H 2 O 2 and • NO scavenging. H 2 O 2 and • NO have relatively low reactivity, with long intracellular and extracellular lifetimes leading to long potential diffusion paths before destruction. Both H 2 O 2 and • NO are uncharged and readily cross cell membranes (Table 1). Large cells have longer intracellular diffusional paths and a lower surface to volume ratios than do smaller cells [1]. Large cells are thus less prone to diffusional losses of intracellular H 2 O 2 and • NO. To maintain H 2 O 2 and • NO homeostasis in the face of slower diffusional losses of H 2 O 2 or • NO out of the cells to the environment, large phytoplankton may have a smaller fraction of their gene contents encoding H 2 O 2 and • NO production. In contrast, loss of function mutations on enzymes that scavenge H 2 O 2 and • NO would be more deleterious in large cells than in smaller cells [83,84].
Hypothesis 3 Flagellated phytoplankton have a larger fraction of their total gene content encoding H 2 O 2 and • NO production, and a smaller fraction of their total gene content encoding H 2 O 2 and • NO scavenging. Increased motility in flagellated cells allows movement away from cytotoxic levels of H 2 O 2 and • NO, possibly complementing scavenging.
Hypothesis 4 Colony forming phytoplankton have a smaller fraction of their total gene content encoding H 2 O 2 and • NO production, and a larger fraction of their total gene content encoding H 2 O 2 and • NO scavenging. Cell spacing in colony forming phytoplankton is so small that the diffusional spheres of H 2 O 2 or • NO diffusing outwards from cells overlap with nearby cells [88], thereby shifting the requirements to maintain homeostasis within cells of a colony.
Hypothesis 5 Pennate Diatoms allocate a larger fraction of their total gene content to H 2 O 2 and • NO production, and a smaller fraction of their total gene content to H 2 O 2 and • NO scavenging than do Centric Diatoms. Pennate diatoms have a small minimum radii even at large biovolumes due to their elongated shape [89]. This cell shape of pennate diatoms allows for more diffusion of H 2 O 2 and • NO across the cell membrane due to the shorter mean diffusion paths to the cell surface and high surface area to volume ratio. To maintain homeostasis, pennate diatoms may have a larger fraction of their total gene content for H 2 O 2 and • NO production compared to centric diatoms. In contrast, pennate diatoms may have a smaller fraction of their gene content for H 2 O 2 and • NO scavenging, compared to centric diatoms.
Our work analyzed the fraction of the total genes in a genome or transcriptome associated with the metabolism of a particular ROS. The presence or absence of genes encoding specific ROS metabolizing enzymes may be caused by genetic drift, or may relate to a selective advantage linked to other metabolites of the same enzyme, rather than an enzymatic role in ROS metabolism, per se. Furthermore, the presence of a gene in a genome does not necessarily mean the encoded enzyme will be active, and closely related enzymes may mediate different activities in different organisms. The influence of non-enzymatic pathways such as carotenoids or tocopherols [42,90,91] likely affect the hypotheses listed above, but were beyond the frame of this study.

Methods
Data dictionary S1 Table contains a data dictionary of variable names used in our analysis, their definitions and locations in code and data objects.
We implemented an automated pipeline using Snakemake [100] to pass gene sequences from downloaded genomes or transciptomes, in.fasta format, to eggNOG-Mapper 2.0.6 [101,102] and then used the DIAMOND algorithm [103] and the eggNOG 5.0 database [104], to annotate potential orthologs in each analyzed genome or transcriptome, using the following parameters: seed_ortholog_evalue = 0.001, seed_ortholog_score = 60, tax_scope = "auto," go_evidence = "non-electronic," query_cover = 20 and subject_cover = 0. The annotation generated for each gene model included (when available): the name of the matching ortholog (coded by eggNOG as 'seed_eggNOG_ortholog'); E-value (coded by eggNOG as 'seed_ortho-log_evalue'); Score (coded by eggNOG as 'seed_ortholog_score'); EC number (coded by egg-NOG as 'EC'); Kegg Orthology (KO) number (coded by eggNOG as 'KEGG_ko'); Kegg Pathway (coded by eggNOG as 'KEGG_Pathway'); Kegg Module (coded by eggNOG as 'KEGG_Module'); Kegg Reaction (coded by eggNOG as 'KEGG_Reaction'); Kegg Reaction Class (coded by eggNOG as 'KEGG_rclass'); the predicted protein family (coded by eggNOG as 'PFAMs'); Gene Ontology (GO) annotation (coded by eggNOG as 'Gos'); as well as a description from eggNOG of the source organism of the matching ortholog (coded by egg-NOG as 'best_og_desc'). Note that comparison of sequences to the eggNOG 5.0 database generates non-supervised orthology annotations, and is subject to error if the underlying egg-NOG annotation was inaccurate, or for functionally divergent orthologous gene sequences. The output of automatically annotated orthologs, from each genome or transcriptome, from the bioinformatic pipeline was compiled into one file CombinedHits.csv (to be submitted to the DRYAD database to support alternate analyses) (Fig 1).
In parallel we assembled metadata from the literature and culture collection databases for each phytoplankter for which we obtained a genome or transcriptome; including the cell radii in μm from 100% of organisms (S1 Fig); colony formation for 84% of organisms; cell shape from diatoms from 100% of diatoms; presence or absence of flagella as an index of potential motility from 100% of organisms; the genome size from all genomes; the total number of predicted gene models from 97% of organisms; and the total number of nuclear genes encoding ribosomal components from 100% of organisms (S3 Table); all stored in CellGenomeMetrics. csv (submitted to the DRYAD database to support alternate analyses; doi.org/10.5061/dryad. kh1893284) (Fig 1). For organisms for which only transcriptomes were available, we only included datasets for which the total number of detected different transcripts was available, as a proxy for the total number of predicted genes. Strains of brackish origin were included but we did not include obligate freshwater strains in our analyses.
Citations were managed using the Zotero (www.zotero.org) open access reference manager connected to Rstudio using the 'citr' [128] package. The Zotero library of citations for this paper is available at (https://www.zotero.org/groups/2333131/ros_phytoplankton). We compared the Enzyme Commission Number (EC number) from CombinedHits to the BRENDA enzyme database [129]  •− production from superoxide oxidase was only documented in vitro with an excess of ubiquinone [130].
• D-amino-acid oxidase was removed from counts of genes encoding H 2 O 2 production, as the enzyme does not produce H 2 O 2 in vivo [131].
• Bacterial non heme ferritin is listed under H 2 O 2 production and scavenging as it produces H 2 O 2 in the first of a two-step reaction and scavenges H 2 O 2 in the second step [132].
From the subset of 'CombinedHits' of enzymes annotated for ROS metabolism, we grouped orthologs together by EC number and their Kegg Orthology number (KO number) and determined the occurrences of individual orthologs encoding each EC number, or KO number when EC number was not available, in a given organism. We merged this data subset with CellGenomeMetrics.csv to generate a dataset of genes encoding ROS metabolizing enzymes, as defined by the EC or KO number, along with characteristics of the source organism, combined into 'MergedData.' From the 'CombinedHits' data frame, we extracted and counted all genes annotated by eggNOG as ribosomal (Genes with the GO annotation 'GO:0005840'; coded as Ribosome_count), which we subsequently use as a proxy for housekeeping genes.
H 2 O 2 , O 2 •− and • NO differ in reactivity, stability, diffusion distance, effects on biomolecules and roles in cell signaling (Table 1). We therefore generated the total gene counts coding for the production or scavenging of each different ROS in a given organism, which were used to generate Poisson or Quasi-Poisson regressions (Fig 1). For • NO, we also ran Binomial probability models to infer the cell size at which organism has an equal probability of having (or not having) the genomic capacity to encode nitric oxide production or scavenging. These presence/absence analyses were not run for H 2

Data validation & justification of statistical analyses
Data from both genomes and transcriptomes were used in this analysis to gain wider representation from more taxa (S1 Fig). Data from the taxa with the largest radii were derived wholly from transcriptomes. Aside from the prokaryote genomes, sourced solely from within the 45n orth south latitude band, the sampled phytoplankton did not exhibit taxonomic biases in source latitude of isolation, but were primarily coastal (S2 Fig). For 40 organisms we had both genomic and transcriptomic data, which we used to test assumptions on data distributions (S3 Fig). As expected, data coverage from paired genomes and transcriptomes derived from the same organism correlated well. Therefore, both genomic and transcriptomic data were available from the same organism, we used genomic data in subsequent analyses (S3 Table), but we used data from transcriptomes when genomes were not available. We validated the gene annotations generated by the snakemake bioinformatic pipeline by comparing the total number of genes encoding ROS metabolism data from a subset of 'CombinedHits.csv' to the total number of genes encoding ROS metabolism data from a manually annotated dataset generated during a pilot project (S3 Fig) [133,134].
S5 Fig shows that  , H 2 O 2 or • NO within an organism to log 10 of the median cell radius in μm. Code used to produce the Poisson and Quasi-Poisson models is on https://github.com/FundyPhytoPhys/ ROS_bioinfo/tree/master/ROSGenomicPatternsAcrossMarinePhytoplankton.
Quasi-Poisson regressions were used when the Poisson regression was over-dispersed (dispersion > 1, p < 0.05) as determined by the 'AER' package [113]. A Poisson regression followed by a chi-squared test, or a Quasi-Poisson regression followed by an F-test, was used to obtain p-values [136], with an alpha value of �0.05 as the threshold for statistical significance of regressions; and a pseudo-R 2 was calculated using the McFadden R 2 method [137].
The total number of genes in each organism increased with the median cell radius, and also varied among the taxonomic lineages (coded as 'Phylum') ( Fig 2). Taxonomic lineage, in turn, interacts strongly with the median cell radius. For our analyses, we sought to detect effects of cell radius upon the fraction of total genes encoding ROS metabolism. We therefore included an offset of the total number of genes in the organism in the Poisson or Quasi-Poisson regressions, which is equivalent to normalizing the number of genes encoding the production or scavenging of H 2 O 2 , O 2 •− or • NO, to the total number of genes in the organism ('GeneModel_count'). We thereby offset the general increase in 'GeneModel_count' with increasing the median cell radius. Because of the strong interaction between the median cell radius and taxonomic lineage (S1 Fig), we did not include Phylum as a co-variate in our subsequent regressions of normalized gene counts vs. median cell radius. Thus, we did not analyze specific influences of Phylum upon gene counts for ROS metabolism. Poisson or Quasi-Poisson regressions were run both with or without 'Colony' and 'Flagella' as co-variates. The total number of ribosomal genes did not increase with median cell radius, but did vary with taxa (S6 Fig). Therefore, we also normalized the number of genes encoding the production or scavenging of H 2 O 2 , O 2 •− or • NO, to the total number of ribosomal genes in the organism ('Ribosome_count'), as a proxy for housekeeping genes. Because median cell radius and taxonomic lineage did not interact in this plot, we included Phylum, or 'Colony' and 'Flagella,' as co-variates in our Poisson or Quasi-Poisson regressions of normalized ribosomal gene counts vs. median cell radius.
To further investigate possible influences of colony formation, the presence of flagella or diatom cell shape (pennate or centric), independent of cell size, upon the fraction of genes that encode the metabolism of H 2 O 2 , O 2 •− or • NO, we used a Wilcoxon test [138] after binning data across all diatom sizes.

Superoxide
Although there are enzymes that specifically produce O 2 •− [139], in the marine phytoplankton genomes and transcriptomes that we analyzed, we did not detect any genes that encode for such enzymes (S2 Table), based on the BRENDA annotation. It is however worth noting the presence of genes annotated as encoding NADPH Oxidase (NOX) in some phytoplankton genomes. NOX can produce either H 2 O 2 or O 2 •− depending on the NOX isoform. NOX is included in our analyses as a H 2 O 2 producer, in accordance with the BRENDA annotation of the enzyme (S2 Table). Further analyses of the detected NOX isoforms might identify whether they include isoforms that produce O 2 •− . Sequences that are similar to Glutathione Reductase (GR) have been documented to produce enzymes that produce extracellular O 2 •− in the diatom Thalassiosira oceanica [139]. We found sequences annotated as GR across all phytoplankton genomes (S4 Table)   scavenging may emerge among the metallo-forms of SOD [144]. For example, in pilot runs discriminating among SOD metallo-forms we found that pico-prasinophytes encode Mn-SOD instead of the Fe-SOD encoded in genomes from larger green algal phytoplankters (Data not visualized) [133]. Differences between genomic patterns of pennate and centric diatoms may arise when comparing metallo-forms of SOD, noting that [145] found that pennate diatoms transcribe Cu/Zn-SOD but not Fe-SOD, whereas centric diatoms transcribe Fe-SOD more frequently than they transcribe Cu/Zn-SOD.

Hydrogen peroxide
All prokaryotic (S9 Fig) and eukaryotic (S10 Fig) phytoplankton, with the exception of a single transcriptome from the prasinophyte Micromonas polaris, have genes encoding H 2 O 2 producing enzymes, as they all carry gene(s) encoding the ubiquitous enzyme Superoxide Dismutase. Genes encoding oxidases producing H 2 O 2 include copropophyrinogen oxidase, found across all eukaryotic and prokaryotic phytoplankton, with the exception of one transcriptome. Genes encoding thiol oxidase and acyl CoA oxidase are also found in nearly all eukaryotic phytoplankton, with the exceptions of three transcriptomes. Genes encoding L-aspartate oxidase are found in nearly all prokaryotes, and all green algae, but are nearly absent from other eukaryotic

PLOS ONE
taxa. Sarcosine oxidase is not present in small diatoms and small green algae, but is present in nearly all dinoflagellates and haptophytes. (S)-2-hydroxy-acid oxidase (whose EC number includes glycolate oxidase) is found in most eukaryotic phytoplankton, but rarely in dinoflagellates.
The absence of catalase from most analyzed cyanobacterial genomes supports [146] who analyzed 44 different cyanobacterial genomes and found that only Nostoc punctiforme PCC73102 retained a full gene encoding catalase. In our analyses, only Synechococcus elongatus PCC11802 maintained a catalase encoding gene (S9 Fig). In the greens, catalase has been lost from the smaller prasinophytes but is maintained in the larger greens (S10 Fig). The loss of catalase from smaller green algae may be evidence of the Black Queen Hypothesis in action [83], in that H 2 O 2 can passively diffuse out of the smaller green algae but diffuses less out of larger Box spans median ± 1 quartile of the data. Whiskers span the range excluding outliers in the data. Citations for data sources can be found in S3 Table. https://doi.org/10.1371/journal.pone.0284580.g004 PLOS ONE green algae. Loss of function mutations in catalase encoding genes in small algae are therefore less deleterious than they would be to large green algae. Catalase, with a K M of~220 mM, may be poorly retained because the cells maintain some genomic capacity to scavenge H 2 O 2 using the enzymes ascorbate peroxidase, glutathione peroxidase and Cytochrome C peroxidase (S10 Fig), with K M in the low μM range [146].
Our results support an earlier suggestion that bigger genomic capacity for H 2 O 2 scavenging in Synechococcus compared to Prochlorococcus is a result of the larger size in Synechococcus compared to Prochlorococcus [84] (S9 Fig). It is however important to note the vast differences between prokaryotic and eukaryotic phytoplankton, with most eukaryotic phytoplankton, regardless of lineage, maintaining the genomic capacity to produce ascorbate peroxidase, glutathione peroxidase and Cytochrome C peroxidase (S10 Fig). Peroxidases are involved in pathways beyond simple ROS scavenging, including the Halliwell-Asada cycle for ascorbate peroxidase [147]. Ostreococcus, the smallest prasinophyte has a radius of 0.5 μm, comparable to that of the prokaryote Synechococcus (S3 Table), and would therefore share a similarly short diffusion path length. Nevertheless Ostreococcus, in common with other eukaryotes, retains genomic capacities to produce ascorbate peroxidase, glutathione peroxidase and cytochrome c peroxidase, which may thus reflect the cost of being eukaryotic (S10 Fig). With increasing cell radius, eukaryotic phytoplankton have a smaller fraction of their total genes encoding the production of H 2 O 2 ( Fig 5, Blue line, Slope = -3.4×10 −1 ± 5×10 −2 , pvalue = 9.6×10 −10 , pseudo-R 2 = 0.34). Including 'Flagella' and 'Colony' as co-variates did not significantly alter this pattern (Black line, 'Flagella' p-value = 8.4×10 −1 , 'Colony' pvalue = 4.7×10 −1 ). The pattern of a smaller fraction of total genes for H 2 O 2 production with increasing cell radius supports our Hypothesis 2 that larger phytoplankton counter decreasing diffusional loss of H 2 O 2 out of cells through a lower genomic capacity for H 2 O 2 production, whereas losses of genes encoding H 2 O 2 producing enzymes are more costly to small phytoplankton (Fig 5). [7] found that a major influence upon the capacity for production of H 2 O 2 is whether or not the organism can form blooms, with bloom forming species producing more H 2 O 2 . The ability to form blooms was not analyzed in our data as we did not find systematic information on potentials for bloom formation across taxa.
With increasing cell radius, eukaryotic phytoplankton also have a smaller fraction of their total genes encoding the capacity to scavenge H 2 O 2 (Fig 5, Blue line, Slope = -3.2×10 −1 ± 5.6×10 −2 , p-value = 1.4×10 −7 , pseudo-R 2 = 0.26). Including 'Flagella' and 'Colony' as co-variates did not influence the negative slope of the fraction of total genes encoding H 2 O 2 scavenging with increasing median cell radius (Fig 5, Black line, 'Flagella' p-value = 4.1×10 −1 , 'Colony' p-value = 1.6×10 −1 ). A parallel analysis focusing only on small phytoplankton such as picocyanobacteria and pico-prasinophytes might yield different results as more such genomes are sequenced, since [148] found that H 2 O 2 added to seawater at a concentration of 1.6 mg L -1 did not affect cells with a radius larger than 1 to 1.5 μm, but differentially harmed the picoprasinophyte Micromonas pusilla.
Because median cell radius co-varied with Taxa, we generally excluded Taxa as a co-variate from our regressions, in order to focus on any cross-taxon patterns driven by changing median cell radius. Nevertheless, representatives of the Ochrophyte Phylum alone spanned more than an order of magnitude in median cell radius. We therefore tested whether the log 10 (total number of genes encoding the metabolism of O 2 •− , H 2 O 2 or • NO) varied with the log 10 (median cell radius) across the Ochrophytes alone (S7 Fig). We found that across Ochrophytes, the fraction of total genes encoding the production of H 2 O 2 decreased with increasing cell radius (Slope = -1.6×10 −1 ± 9.4×10 −2 ), although the p-value for the regression was only 1×10 −1 ). This marginal decrease in the total number of genes encoding H 2 O 2 production with increasing median cell radius in Ochrophytes again tends to support our Hypothesis 2, with data from within a single phylum to limit confounding influences of diverse evolutionary histories and cell biologies upon patterns. H 2 O 2 production (Slope = -3.7×10 −1 ± 2×10 −1 , p-value = 6.7×10 −2 ) and scavenging (Slope = -3.5×10 −1 ± 2.1×10 −1 , p-value = 1×10 −1 ) allocations were steady with increasing cell size, relative to the ribosomal housekeeping gene proxy. But, genes for H 2 O 2 production and scavenging are diluted by increasing total gene counts with increasing cell size.
Pennate and centric diatoms do not show statistically significant differences in the fraction of their total gene content encoding the production (p-value = 1.9×10 −1 ) nor the scavenging of H 2 O 2 (p-value = 9.6×10 −2 ). Pennate and centric diatoms also do not show statistically significant differences in the ratio of genes encoding production (p-value = 3.3×10 −1 ) nor the scavenging of H 2 O 2 (p-value = 3.9×10 −1 ), normalized to their ribosomal gene content encoding H 2 O 2 . These results do not support our Hypothesis 5 that pennates have more genes encoding H 2 O 2 producing enzymes due to their higher surface area to volume ratio (Data not visualized).  Table. https://doi.org/10.1371/journal.pone.0284580.g005

Nitric oxide
In the genomes and transcriptomes that we analysed, Nitric Oxide Synthase (NOS, EC:1.14.13.39), although often absent, was the most frequently occurring • NO producing enzyme encoded (S11 Fig), but was not encoded, or at least not annotated, among prokaryotic phytoplankton (Data not visualized).
Non-enzymatic paths contribute to intracellular and extracellular • NO production [150], and may explain the absences of genes encoding • NO production from some genomes across taxonomic lineages. Alternately, • NO homeostasis may be achieved in some lineages by regulating the active cellular uptake and release of intracellular • NO, as has been recently demonstrated in humans [151]. Although NOD sequences have only been identified from phytoplankton through meta-transcriptomic analyses, in diatoms, haptophytes and dinoflagellates [152], there is limited understanding as to what may contribute to the active removal of • NO, and the lack of • NO scavenging genes across multiple phytoplankters. More research is needed on possible contributions of NOD to the active removal of • NO, as well as the NOS sequences detected in Synechococcus that also display NOD-like activity [149]. Perhaps the low toxicity of • NO does not warrant the active removal of • NO as long as the concentration does not exceed the toxic threshold. This explanation is plausible given that Platymonas helgolandica, Platymonas subcordiformis, Skeletonema costatum, Gymnodinium sp., and Prorocentrum donghaiense showed optimum growth in the presence of • NO concentrations between 10 −9 and 10 −6 mol L -1 [153], which are higher than the concentrations found in the ocean (Table 1).
A binomial model comparing the presence or absence of genes that encode the production of • NO shows no cell size effect (slope = -3.5×10 −1 , p = 1.4×10 −1 ). Including 'Flagella' as a covariate does not alter these results, but does show that flagellated phytoplankton have higher likelihood of presence of • NO production than do non-flagellated phytoplankton (S12 Fig, p = 5.6×10 −5 ). Including 'Colony' as a co-variate does not show a cell size effect, nor a difference in the likelihood of • NO production between colony and non-colony forming phytoplankton (p = 1.3×10 −1 ).
The larger fractional gene allocation to • NO production, and smaller fraction of genes that encode • NO scavenging enzymes, in centric diatoms (Fig 7) counters our hypothesis that diffusion from pennate diatoms would drive gene allocations in favor of • NO production (Hypothesis 5). Given the strong contrast in annotated • NO metabolism genes, it is likely that • NO has regulatory or signaling roles that vary systematically between pennate and centric diatoms, outside any diffusional influences. For example, • NO inhibits diatom adhesion to substrate [72,154]. Pennates are more likely to grow adhered in biofilms [155], which may explain the striking differences in total gene allocation to • NO production and scavenging. Alternately, [156] identified putative NOS sequences in the transcriptomes of three pennate diatom species (Pseudo-nitzschia arenysensis, Pseudo-nitzschia delicatissima and Pseudo-nitzschia multistriata), so it is possible the apparent lack of • NO producing sequences in pennates is due to errors in the unsupervised annotations from eggNOG. Notch spans ± standard error of the median. Box spans median ± 1 quartile of the data. Whiskers span the range excluding outliers in the data. Citations for data sources can be found in S3 Table. https://doi.org/10.1371/journal.pone.0284580.g007

Summary
We analyzed the fractions of the total genes in the genome that are associated with the metabolisms of three major ROS. It is important to note that the content of genes encoding specific ROS metabolizing enzymes may be caused by genetic drift, or may relate to a selective advantage linked to other metabolites of the same enzymes, rather than an enzyme role in ROS metabolism, per se. Furthermore, the gene presence or gene count in a genome is only one influence on the potential activity of the encoded enzyme, and closely related enzymes may confer different activities in different organisms.
The differential reactivities, diffusion distances, diffusibilities across cell membranes, and roles in cell signaling of H 2 O 2 , O 2 •− and • NO ( • NO has low reactivity, long intracellular and extracellular lifetimes and readily crosses cell membranes. Neither the fraction of the total genes for • NO production nor for scavenging changed significantly with increasing cell radius, consistent with relatively low cytotoxicity and roles of • NO in taxonomically diverse regulatory systems (contrary to hypothesis 5). Pennate diatoms frequently lack genes annotated as encoding • NO producing enzymes, whereas centric diatoms frequently lack genes annotated as encoding • NO scavenging enzymes (contrary to hypothesis 5). This finding is not explicable by differential diffusional losses of • NO, but may reflect distinct roles of • NO in the regulatory systems of diatom lineages.    Taxa'). Filled data points indicate that the data obtained from that organism was sourced from a genome, and unfilled data points were sourced from a transcriptome. The size of the symbol increases with the number of members of each enzyme found within each genome or transcriptome. Symbol absence means no sequences known to encode the enzyme family of interest were found in the target genome or transcriptome. The absence of transcripts encoding SOD from the Micromonas polaris transcriptome is likely due to low expression of SOD at the time that the mRNA was harvested for sequence analyses.